Few-Shot Learning for Rooftop Detection in Satellite Imagery

Deep Learning Tutorial

Giorgio Coppala, Nadine Daum, Elena Dreyer, Nico Reichardt

Problem Setting

  • Cities need accurate rooftop maps to plan and scale up solar photovoltaic (PV) installations

  • Manual rooftop labeling is slow and costly

  • Every city looks different → models trained on one city often generalize poorly to others

Idea:

  • Few-shot learning enables segmentation with only a handful of labeled examples

Dataset: Rooftops of Geneva

  • Imagery: High-resolution RGB satellite images of Geneva, available on Hugging Face
  • Size: 1,050 labeled image-mask pairs
  • Task: Binary segmentation (rooftop vs. background)
  • Geographic splits: 3 grids/neighborhoods (1301_11, 1301_13, 1301_31)
  • Image size: 250 × 250 pixels
  • Categories: Industrial, Residential

Few-Shot Learning in General

Few-Shot Learning (FSL)

  • Learning new tasks, classes, or segmentations from very few labeled examples (N-way K-shot: N classes, K labeled examples per class)
  • Motivation:
    • Data scarcity
    • Expensive and time-consuming annotation
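For the rooftop task an episode is effectively 1-way (rooftop vs. background), so building one only requires drawing K support pairs and a disjoint query from the pool of labeled tiles. The sketch below is a minimal illustration under that assumption; the function name and the `tile_XXXX` identifiers are hypothetical stand-ins for the 1,050 Geneva image-mask pairs.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_episode(
    pair_ids: Sequence[str], k_shot: int, n_query: int = 1, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Draw disjoint support and query sets for one 1-way K-shot episode.

    pair_ids are identifiers of image-mask pairs (hypothetical naming scheme).
    """
    if k_shot < 1 or n_query < 1:
        raise ValueError("k_shot and n_query must be positive")
    if k_shot + n_query > len(pair_ids):
        raise ValueError("not enough image-mask pairs for this episode")
    rng = random.Random(seed)
    # Sampling without replacement guarantees that support and query never overlap.
    chosen = rng.sample(list(pair_ids), k_shot + n_query)
    return chosen[:k_shot], chosen[k_shot:]


# Usage with hypothetical IDs for the 1,050 labeled pairs:
ids = [f"tile_{i:04d}" for i in range(1050)]
support, query = sample_episode(ids, k_shot=5, seed=0)
```

Fixing the seed makes an episode reproducible, which helps when comparing 1-shot and 5-shot runs on the same query tiles.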

Few-Shot Semantic Segmentation (FSSS)

  • Goal: Segment novel object classes using only a few annotated examples
  • Output: a class label for every pixel

Prototypical Networks (ProtNets)

  • Learn a shared embedding space
  • Pixels belonging to the same class are close in feature space
  • Each class is represented by a prototype, typically the mean embedding of its labeled support pixels
  • Training follows an episodic framework
  • Each episode consists of:
    • Support set:
      • Few images with pixel-level masks
      • Defines the target classes
    • Query image:
      • Image where the model must segment the target classes
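One common way to turn a support set into a prototype for segmentation is masked average pooling: average the embeddings of all pixels the mask marks as the class. The NumPy sketch below assumes the masks have already been resized to the feature-map resolution; the function name is hypothetical, and a background prototype would be computed the same way from `1 - masks`.

```python
import numpy as np


def masked_average_prototype(features: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Prototype of one class from K support shots via masked average pooling.

    features: (K, C, H, W) support embeddings
    masks:    (K, H, W) binary masks (1 = class pixel), same H x W as features
    returns:  (C,) mean embedding over all class pixels of all K shots
    """
    m = masks[:, None, :, :].astype(features.dtype)  # (K, 1, H, W), broadcast over C
    total = (features * m).sum(axis=(0, 2, 3))       # sum of masked embeddings, (C,)
    return total / max(float(m.sum()), 1e-8)         # guard against an empty mask


# Tiny check: one shot, 2 channels, 2x2 grid, two rooftop pixels.
feats = np.array([[[[1.0, 2.0], [3.0, 4.0]],
                   [[10.0, 10.0], [10.0, 10.0]]]])   # (1, 2, 2, 2)
mask = np.array([[[1, 0], [0, 1]]])                  # (1, 2, 2)
proto = masked_average_prototype(feats, mask)        # -> [2.5, 10.0]
```

Pooling all foreground pixels across the K shots weights each shot by its rooftop area; averaging one prototype per shot instead would give every support image equal weight.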

Prototypical Network Overview

Workflow

  • Support image + mask → Prototype → Similarity to query features → Query segmentation
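The workflow can be sketched end to end in NumPy. This is a minimal illustration, not the team's implementation. It assumes that support and query feature maps already share the mask resolution (in practice the mask would be downsampled or the features upsampled). It builds one rooftop and one background prototype by masked averaging, then labels each query pixel with the nearest prototype under squared Euclidean distance, the metric of the original Prototypical Networks. The helper names are hypothetical.

```python
import numpy as np


def masked_mean(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mean embedding (C,) over pixels where mask is 1; features (C, H, W), mask (H, W)."""
    m = mask.astype(features.dtype)
    return (features * m).sum(axis=(1, 2)) / max(float(m.sum()), 1e-8)


def segment_query(support_feat: np.ndarray, support_mask: np.ndarray,
                  query_feat: np.ndarray) -> np.ndarray:
    """1-shot segmentation: returns an (H, W) mask with 1 = rooftop, 0 = background."""
    m = support_mask.astype(support_feat.dtype)
    protos = np.stack([masked_mean(support_feat, 1.0 - m),  # index 0: background
                       masked_mean(support_feat, m)])       # index 1: rooftop
    # Squared Euclidean distance from every query pixel to each prototype: (2, H, W).
    dists = ((query_feat[None] - protos[:, :, None, None]) ** 2).sum(axis=1)
    return dists.argmin(axis=0)                              # nearest prototype wins


# Toy example: 2-channel embeddings on a 2x2 grid.
support = np.zeros((2, 2, 2))
support[1] = 1.0                       # background pixels embed as [0, 1]
support[:, 0, 0] = [1.0, 0.0]          # one rooftop pixel embeds as [1, 0]
mask = np.array([[1, 0], [0, 0]])
query = np.array([[[0.1, 0.9], [0.8, 0.1]],
                  [[0.9, 0.1], [0.3, 0.9]]])  # (C=2, H=2, W=2)
pred = segment_query(support, mask, query)    # -> [[0, 1], [1, 0]]
```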

Feature Extraction

  • Backbone: ResNet-18 CNN, pretrained on ImageNet
  • Projection: backbone feature maps → 256-channel embedding space
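The projection step can be read as a 1×1 convolution, i.e. the same linear map applied to the channel vector at every spatial position. The sketch below assumes this design (the slides do not say how the projection is implemented). It also assumes a 512-channel input, which is what the last ResNet-18 stage (layer4) produces; for a 250 × 250 input that stage has an 8 × 8 spatial grid. In PyTorch the equivalent layer would be `nn.Conv2d(512, 256, kernel_size=1)`. The weights here are random stand-ins for learned parameters.

```python
import numpy as np


def project_1x1(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution as a per-pixel linear map.

    features: (C_in, H, W) backbone feature map
    weight:   (C_out, C_in) projection matrix (would be learned in practice)
    bias:     (C_out,)
    returns:  (C_out, H, W) embedding map
    """
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]


rng = np.random.default_rng(0)
feats = rng.standard_normal((512, 8, 8))    # assumed layer4 output for a 250x250 tile
w = rng.standard_normal((256, 512)) * 0.01  # random stand-in for learned weights
b = np.zeros(256)
emb = project_1x1(feats, w, b)              # shape (256, 8, 8)
```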

Evaluation Metric

Intersection over Union between the predicted rooftop mask A and the ground-truth mask B:

\[ \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \]
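For binary masks the formula reduces to counting pixels. A minimal NumPy sketch follows; the choice to return 1.0 when both masks are empty is an assumed convention, not one stated on the slides. Over a test split, IoU can be averaged per image or computed from intersections and unions summed over all images; the slides do not specify which.

```python
import numpy as np


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union for binary masks (nonzero = rooftop)."""
    a, b = pred.astype(bool), target.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(a, b).sum() / union)


pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
score = iou(pred, gt)  # 1 shared pixel / 2 pixels in the union = 0.5
```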

Prototypical Network Overview

Figure 1: (Modified figure from SRPNet)

(Preliminary) Results

  • Show performance for 1-shot / 5-shot / full-data comparison

  • Show predicted masks

Wrap-Up / Discussion

What we still want to work on:

  • Testing different pretrained encoders (ResNet-50, a ResNet pretrained on satellite imagery)

  • Experimenting with different distance metrics (cosine similarity and fidelity)

  • Evaluating different values of K (number of support shots) and comparing their performance
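Swapping the distance metric only changes how each query pixel is scored against the prototypes. The sketch below contrasts negative squared Euclidean distance with cosine similarity on a single hypothetical pixel. Cosine ignores embedding magnitude, so the two metrics can pick different prototypes. Fidelity is not sketched because the slides do not define it. All names and values are illustrative.

```python
import numpy as np


def neg_sq_euclidean(query: np.ndarray, proto: np.ndarray) -> np.ndarray:
    """Score (H, W): minus squared distance from each query pixel (C, H, W) to proto (C,)."""
    return -((query - proto[:, None, None]) ** 2).sum(axis=0)


def cosine(query: np.ndarray, proto: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Score (H, W): cosine similarity between each query pixel and proto."""
    dot = np.einsum("chw,c->hw", query, proto)
    return dot / (np.linalg.norm(query, axis=0) * np.linalg.norm(proto) + eps)


def predict(query: np.ndarray, protos, metric) -> np.ndarray:
    """Label each pixel with the index of the highest-scoring prototype."""
    return np.stack([metric(query, p) for p in protos]).argmax(axis=0)


protos = [np.array([3.0, 3.0]), np.array([1.0, 0.0])]  # index 0: background, 1: rooftop
pixel = np.array([3.0, 1.0]).reshape(2, 1, 1)          # one query pixel, C=2
by_cosine = predict(pixel, protos, cosine)             # direction closer to rooftop
by_euclid = predict(pixel, protos, neg_sq_euclidean)   # position closer to background
```

Here cosine scores are about 0.894 (background) vs. 0.949 (rooftop), while squared distances are 4 vs. 5, so the two metrics disagree on the same pixel.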

Discussion and Key Takeaways

  • Strong 1-shot performance: even a single labeled support image yielded good rooftop segmentations

  • The dataset posed its own challenges: it was designed primarily for PV assessment, not for general-purpose segmentation

  • Limited scope: only rooftops in Geneva were studied → generalization to other cities remains an open question

  • More diverse training data or more complex models could further improve performance

GitHub Repo